Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migration,*: onboard TruncatedAndRangeAppliedStateMigration #57694

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Dec 8, 2020

This PR onboards the first real long-running migration using the
infrastructure we've been building up within pkg/migration. It was
originally prototyped in #57445.

TruncatedAndRangeAppliedStateMigration (which also happens to be a
below-Raft one) lets us do the following:

i. Use the RangeAppliedState on all ranges
ii. Use the unreplicated TruncatedState on all ranges

In 21.2 we'll finally be able to delete holdover code that knows how to
handle the legacy replicated truncated state.

Release note(general change): Cluster version upgrades, as initiated by
SET CLUSTER SETTING version = <major>-<minor>, now perform internal
maintenance duties that will delay how long it takes for the command to
complete. The delay is proportional to the amount of data currently
stored in the cluster. The cluster will also experience a small amount
of additional load during this period while the upgrade is being
finalized.

Release note: None


Only the last commit here is of interest. All prior commits are from #57662,
#57650 and #57366.

Makes for better logging.

Release note: None
We can separate out the `Helper`, `Migration`, and various utilities
into their own files. We'll add tests for individual components in
future commits; the physical separation here sets the foundation for
doing so (prototyped in cockroachdb#57445). This commit is purely code movement.

Release note: None
It's clearer to talk explicitly in terms of causality.

Release note: None
We re-define what the Migration type is to be able to annotate it a
description. We'll later use this description when populating the
`system.migrations` table (originally prototyped in cockroachdb#57445).

Release note: None
We make it a bit more ergonomic (this revision was originally
prototyped in cockroachdb#57445).

Release note: None
To facilitate testing `Helper` in isolation, we introduce a `cluster`
interface that we'll mock out in tests. It's through this interface that
the migration infrastructure will able to dial out to a specific node,
grab hold of a kv.DB instance, and retrieve the current cluster
membership.

Part of diff also downgrades `RequiredNodes` from being a first class
primitive, instead tucking it away for internal usage only. Given
retrieving the cluster membership made no guarantees about new nodes
being added to the cluster, it's entirely possible for that to happen
concurrently with it. Appropriate usage then entailed wrapping it under
a stabilizing loop, like we do so in `EveryNode`. This tells us there's
no need to expose it directly to migration authors.

Release note: None
It's not currently wired up to anything. We'll use it in future PRs to
send out `Migrate` requests to the entire keyspace. This was originally
prototyped in cockroachdb#57445. See the inline comments and the RFC (cockroachdb#48843) for
the motivation here.

Release note: None
@irfansharif irfansharif requested review from knz and tbg December 8, 2020 03:30
@irfansharif irfansharif requested a review from a team as a code owner December 8, 2020 03:30
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif changed the title migration.*: onboard TruncatedAndRangeAppliedStateMigration migration,*: onboard TruncatedAndRangeAppliedStateMigration Dec 8, 2020
@irfansharif irfansharif force-pushed the 201207.truncated-state-migration branch from 6c0acc7 to 95fa479 Compare December 8, 2020 03:31
This command forces all ranges overlapping with the request spans to
execute the (below-raft) migrations corresponding to the specific,
stated version. This has the effect of moving the range out of any
legacy modes operation they may currently be in. KV waits for this
command to durably apply on all the replicas before returning,
guaranteeing to the caller that all pre-migration state has been
completely purged from the system.

We're currently not wiring it up to anything. We will in a future commit
that introduces the truncated state migration. This commit was pulled
out of our prototype in cockroachdb#57445.

Release note: None
We introduce two new RPCs for the migration infrastructure to use.
`GCReplicas` will be used to instruct the target node to process all
GC-able replicas. `FlushAllEngines` will be used to instruct the target
node to persist all in-memory state to disk. Both of these are necessary
primitives for the migration infastructure.

Specifically this comes up in the context of wanting the ensure that
ranges where we've executed a ranged `Migrate` command over have no way
of ever surfacing pre-migrated state. This can happen with older
replicas in the replica GC queue and with applied state that is not yet
persisted. We elaborate on both of these below:

Motivation for `GCReplicas`: Currently we wait for the `Migrate` to have
applied on all replicas of a range before returning to the caller. This
does not include earlier incarnations of the range, possibly sitting
idle in the replica GC queue. These replicas can still request leases,
and go through the request evaluation paths, possibly tripping up
assertions that check to see no pre-migrated state is found. For this
reason we introduce the `GCReplicas` RPC that the migration manager can
use to ensure all GC-able replicas are processed before declaring the
specific cluster version bump complete.

Motivation for `FlushAllEngines`: Like we mentioned above, KV currently
waits for the `Migrate` command to have applied on all replicas before
returning. With the applied state, there's no necessity to durably
persist it (the representative version is already stored in the raft
log). Out of an abundance of caution, and to really really ensure that
no pre-migrated state is ever seen in the system, we provide the
migration manager a mechanism to flush out all in-memory state to disk.
This way the manager can guarantee that by the time a specific cluster
version is bumped, all pre-migrated state from prior to that cluster
version will have been fully purged from the system.

---

The ideas here follow from our original prototype in cockroachdb#57445. Neither of
these RPCs are currently wired up to anything. That'll happen in a
future commit introducing the raft truncated state migration.

Release note: None
@irfansharif irfansharif force-pushed the 201207.truncated-state-migration branch from 95fa479 to 36b37f3 Compare December 8, 2020 03:37
This PR onboards the first real long-running migration using the
infrastructure we've been building up within pkg/migration. It was
originally prototyped in cockroachdb#57445.

TruncatedAndRangeAppliedStateMigration (which also happens to be a
below-Raft one) lets us do the following:

  i. Use the RangeAppliedState on all ranges
  ii. Use the unreplicated TruncatedState on all ranges

In 21.2 we'll finally be able to delete holdover code that knows how to
handle the legacy replicated truncated state.

Release note(general change): Cluster version upgrades, as initiated by
SET CLUSTER SETTING version = <major>-<minor>, now perform internal
maintenance duties that will delay how long it takes for the command to
complete. The delay is proportional to the amount of data currently
stored in the cluster. The cluster will also experience a small amount
of additional load during this period while the upgrade is being
finalized.

Release note: None
@irfansharif irfansharif force-pushed the 201207.truncated-state-migration branch from 36b37f3 to ef96beb Compare December 8, 2020 04:57
@tbg
Copy link
Member

tbg commented Dec 15, 2020

Intentionally holding off on the review here until the building blocks settle, but the broad strokes here look good.

@irfansharif
Copy link
Contributor Author

Superceded by #58088.

@irfansharif irfansharif deleted the 201207.truncated-state-migration branch December 22, 2020 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants